Fix some issues with dynamic algorithm selection in coll/tuned #8198
jsquyres merged 5 commits into open-mpi:v4.1.x
The mca parameters coll_tuned_*_algorithm are ignored unless coll_tuned_use_dynamic_rules is true so mention that in the description. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu> (cherry picked from commit 06f605c)
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu> (cherry picked from commit 7261255)
…d fall back to linear Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs Allreduce: rabenseifner expects N_elem >= pow2 nearest to N_procs In all cases, the implementations will fall back to a linear implementation, which will most likely yield the worst performance (noted for 4B bcast on 128 ranks) Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu> (cherry picked from commit 04d198f)
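The preconditions in this commit message can be written out as a small check. This is only an illustrative sketch, not the coll/tuned code: the function names are hypothetical, and "pow2 nearest to N_procs" is assumed here to mean the largest power of two that does not exceed N_procs.

```python
def largest_pow2_at_most(n: int) -> int:
    """Largest power of two that does not exceed n (requires n >= 1)."""
    return 1 << (n.bit_length() - 1)


def falls_back_to_linear(algorithm: str, n_elem: int, n_procs: int) -> bool:
    """Hypothetical helper: True if the named algorithm's precondition,
    as quoted in the commit message, is not met for this call."""
    if algorithm in ("scatter_allgather", "scatter_allgather_ring"):
        # Bcast: expects N_elem >= N_procs
        return n_elem < n_procs
    if algorithm == "rabenseifner":
        # Allreduce: expects N_elem >= pow2 nearest to N_procs
        # (assumption: "nearest" is rounded down)
        return n_elem < largest_pow2_at_most(n_procs)
    # Other algorithms are outside the scope of this sketch.
    return False


# A 4B bcast on 128 ranks, assuming the 4 bytes are a single element:
print(falls_back_to_linear("scatter_allgather", n_elem=1, n_procs=128))
```

Under these assumptions, a one-element broadcast on 128 ranks fails the N_elem >= N_procs check, which matches the fallback noted in the commit message.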
I added a commit that removes the selection of linear algorithms in allreduce and allgather. In my measurements the latency for these ranges is higher than necessary and I don't see how that is motivated by previous measurements (it seems unlikely to me that linear algorithms perform well at several dozen or several hundred ranks).
@devreal ICYMI, something was off with allgatherv too (I'd tested with the 4.1.x branch #8186 (comment)). Is that something you are seeing?
@rajachan I have not yet looked at allgatherv. I can run some tests for that overnight and see. Do you remember at what scales things were weird?
I was running with ~1K ranks (32 nodes with 36 ranks per node).
Btw, your master PR is missing the allreduce/allgather commit. |
Oops, pushed to the wrong branch. Will fix in a minute |
Nice catch |
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu> (cherry picked from commit 22e289b)
…lgather These selections seem harmful in my measurements and don't seem to be motivated by previous measurement data. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu> (cherry picked from commit a15e5dc)
0f89397 to 3cae9f7
This PR addresses a potential performance issue with the algorithm selection in coll/tuned and some minor issues found while digging into it:
Backport of #8186 to v4.1.x
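For anyone reproducing the measurements discussed here: the first commit notes that the coll_tuned_*_algorithm parameters are ignored unless coll_tuned_use_dynamic_rules is true, so both have to be set together. A minimal sketch of building such an mpirun command line follows. The helper name, the benchmark binary `./bcast_bench`, and the algorithm ID `1` are placeholders; the valid IDs for each collective should be looked up in the parameter descriptions (for example via `ompi_info --param coll tuned --level 9`).

```python
from typing import List


def tuned_mca_args(collective: str, algorithm_id: int) -> List[str]:
    """Hypothetical helper: mpirun arguments that force one coll/tuned
    algorithm. Dynamic rules are enabled as well, because the forced
    algorithm is ignored otherwise."""
    return [
        "--mca", "coll_tuned_use_dynamic_rules", "1",
        "--mca", f"coll_tuned_{collective}_algorithm", str(algorithm_id),
    ]


# Placeholder rank count, algorithm ID, and benchmark binary.
cmd = ["mpirun", "-n", "128", *tuned_mca_args("bcast", 1), "./bcast_bench"]
print(" ".join(cmd))
```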